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Abstract 

Hansen (1982) proposed a class of generalized method of moments (GMMs) for esti- 
mating a vector of regression parameters from a set of score functions. Hansen estab- 
lished that, under certain regularity conditions, the estimator based on the GMMs 
is consistent, asymptotically normal and asymptotically efficient. In the generalized 
estimating equation framework, extending the principle of the GMMs to implicitly 
estimate the underlying correlation structure leads to a quadratic inference function 
(QIF) for the analysis of correlated data. The main objectives of this research are 
to (1) formulate an appropriate estimated covariance matrix for the set of extended 
score functions defining the inference functions; (2) develop a unified large-sample 
theoretical framework for the QIF; (3) derive a generalization of the QIF test statis- 
tic for a general linear hypothesis problem involving correlated data while establishing 
the asymptotic distribution of the test statistic under the null and local alternative 
hypotheses; (4) propose an iteratively reweighted generalized least squares algorithm 
for inference in the QIF framework; and (5) investigate the effect of basis matrices, 
defining the set of extended score functions, on the size and power of the QIF test 
through Monte Carlo simulated experiments. 

Key Words: Covariance structure; Extended score; Generalized estimating equations; 
Generalized least squares; Generalized method of moments; Longitudinal data; Quasi- 
likelihood. 
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1 Introduction 



Correlated data arises when a response is measured at repeated instances on a set 
of subjects within a study design. A canonical problem is to determine a regression 
relationship between the measured responses and a set of covariates. Responses on 
different subjects are assumed to be independent, while the repeated measurements 
on individual subjects are correlated with an unknown correlation structure. Any 
inferential procedure must take account of this correlation (Crowder and Hand, 1990; 
Diggle et al, 1994). 

Let Y it be a response, with a corresponding g-dimensional vector of covariates Xj t , 
measured at the tth (t = l,...,rij) time point on the ith (i = 1,...,N) subject. 
Assuming a generalized linear model for Yu and X# yields 



where (3 G B is a g-dimensional vector of unknown regression parameters, is a 
dispersion parameter, h( • ) is a known inverse link function and v( • ) is a known vari- 
ance function. The principal objective is to derive inferential theory for the unknown 
parameter vector f3. 

For simplicity of exposition, we assume that each subject is observed at a common 
set of time points t = 1, . . . , n. Let Yj = (Yu, . . . , Y in ) T be the response vector and 
^ = E(Yi) = {h(Xj 1 f3),...,h(Xj n f3)} be the vector of means. Furthermore, let 
the operator V denote a partial derivative with respect to the elements of f3 so that 
Vhj represents the (n x q) matrix (dh i /df3 l , . . . , dhi/df3 q ) for each i = 1, . . . , N. 




(1) 



and 



v a i(Y u ) = <Pv(Xl(3), 



(2) 
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1.1 Background 



The quasi-likelihood estimating equation (Wedderburn, 1974) for (3, under the gen- 
eralized linear model framework is defined as 

N 

s((3) ■.= {vhf wr 1 (Yi -h t )} = 0, (3) 

i=l 

where Wj = cov(Yj) is an (n x n) diagonal matrix with elements determined by 
the variance function (2). When the independence assumption within a subject for 
the responses is relaxed, the matrices Wj are no longer diagonal, instead have an 
unknown correlation structure that needs to be incorporated into the model. In 
a seminal article, Liang and Zeger (1986) proposed generalized estimating equations 
(GEEs), based on the ingenious idea of using a working correlation matrix with a 
nuisance parameter vector to simplify Wj. In particular, they proposed the GEEs 
based on 

W, = A]' 2 R(a) Aj 12 for i = 1 , . . . , N, 

where Aj is the diagonal matrix of marginal variance of Yj and R(a) is the working 
correlation matrix with an unknown nuisance parameter vector a. Specific choices 
of R(a) correspond to common correlation structures, such as the exchangeable and 
AR-1. 

The GEE approach yields a consistent estimator of (3 even when the working cor- 
relation structure R(a) is misspecified. However, under such misspecification, the 
GEE estimator of (3 is not efficient. Furthermore, Crowder (1995) established that 
there are difficulties with estimating the nuisance parameter vector ck in the GEE 
framework and that in certain cases, the estimator of ck does not exist. 

Hansen (1982) proposed the class of generalized method of moments (GMMs) for es- 
timating the vector of regression parameters from a set of score functions, where the 
dimension of the score function exceeds that of the regression parameter. Hansen es- 
tablished that, under certain regularity conditions, the GMM estimator is consistent, 
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asymptotically normal and asymptotically efficient. To overcome the difficulties as- 
sociated with the GEEs, Qu et al. (2000) (henceforth abbreviated QLL) applied the 
principle of the GMMs in the GEE framework that implicitly estimates the underly- 
ing correlation structure: In particular, they proposed a clever approach based on the 
quadratic inference functions (QIFs) (1) to estimate the working correlation struc- 
ture R(ck) such that the resulting estimator always exists and (2) to obtain better 
efficiency in estimating f3 within the assumed family even under the misspecification 
of R(a). 

1.2 Main Results 

In this article, we derive a unified large-sample theoretical framework for the QIF. 
The cornerstone of our theory is stated in Theorem 5 which establishes a uniform 
quadratic approximation to the QIF surface in a neighborhood of /3 , the true re- 
gression parameter vector. This result has two fundamental consequences. First, it 
provides the necessary machinery to establish that the QIF is asymptotically equiv- 
alent to the generalized least squares criterion. This leads to techniques for deriving 
large-sample results for the QIF estimators and test statistics, analogous to stan- 
dard inferential theory for the generalized least squares methods. Second, it provides 
a flexible algorithm for finding the QIF estimators. Building on the quadratic ap- 
proximation to the QIF, we create an iteratively reweighted generalized least squares 
(IRGLS) algorithm for estimation and testing in the QIF framework. This algorithm 
is stable and computationally more feasible than the Newton-Raphson algorithm rec- 
ommended in the existing QIF literature (Figure 1 demonstrates the necessity for the 
IRGLS algorithm). 

The QLL article has made important contributions to the analysis of correlated data. 
However, some of the asymptotic results in QLL are flawed, including the proof of 
their Theorem 1. QLL employ a Taylor series expansion in proving the Theorem 1; 



4 



however, there is a fundamental problem since the Taylor series expansion is in terms 
of the partitioned parameter vector /3 = (?/>, A) and not in terms of the sample size N. 
Even with the remainder terms, Taylor's theorem does not provide any statements 
regarding the behavior of the error term as a function of the sample size N and neither 
has this been established in QLL. It is not possible to patch up either the original 
proof or other asymptotic results in the QLL article (the reasons will be made clear 
in later sections). These difficulties have not been identified explicitly; hence, there 
is not yet any rigorous development of a unified asymptotic theory for the QIF. We 
develop a more broadly applicable inferential theoretical framework for the QIF that 
extends and corrects the results of QLL. 

As a first step in achieving the objectives, we re-formulate the estimated covariance 
matrix of the "extended score functions" defining the QIF. This formulation yields 
a consistent estimator of the covariance matrix, which in turn lays the foundation 
for deriving valid inferential theory for the QIF. The main results are summarized as 
follows: 

1. We formulate an appropriate estimated covariance matrix for the set of ex- 
tended score functions defining the QIF (Section 1.3). In Sections 2 and 3, we 
first formulate a unified large-sample theoretical framework for the QIF and 
next derive several important asymptotic properties for the QIF. These lay the 
necessary foundation for the development of the asymptotic results derived in 
later part of the article. 

2. In Section 4, we first derive the principal result, the quadratic approximation 
to the QIF surface in a neighborhood of f3 , the true regression parameter 
vector. Next, we formulate a statistic based on the QIF for testing general linear 
hypotheses involving correlated data. Building on the quadratic approximation 
to the QIF, we establish the asymptotic distribution of the generalized QIF test 
statistic under both the null and local alternative hypotheses. 

3. In Section 5, we propose a stable and computationally feasible IRGLS algorithm 
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for estimating (3 in the QIF framework. This algorithm is a step in the direction 
of developing a unified framework for estimation, testing and model selection 
for correlated data within the QIF setting. 

4. In Section 6, we illustrate the methods using a benchmark dataset consisting 
of the correlated binary data measuring the respiratory health effects of indoor 
and outdoor air pollution. 

5. In Section 7, we investigate the effect of the basis matrices (defining the set of 
extended score functions) on the size and power of the QIF test through Monte 
Carlo simulation experiments from Bernoulli and Gaussian distributions. 

This article derives asymptotic theory for testing general linear hypotheses based on 
the quadratic approximation of the QIF. However, a common thread underlying the 
recent literature in the context of nonlinear testing problems is in fact the Theorem 5: 
(1) Pilla et al. (2005) derived asymptotic distribution of the test statistic for order- 
restricted hypothesis testing problem; (2) Pilla (2005) developed inferential theory 
for testing under the general convex cone alternatives for correlated data; and (3) 
Loader and Pilla (2005a) derived several properties of the IRGLS algorithm which is 
more generally applicable while providing a flexible technique for estimation, testing 
and model selection with correlated data. 

Having the correct asymptotic theory for the QIF is essential for further extensions 
as well as applications of the QIF, especially given that the QIF is elegant, simple 
and practical to implement with the proposed IRGLS algorithm for the analysis of 
correlated data. 

1.3 The Quadratic Inference Function 

QLL showed that the principle of GMMs can be applied in the GEE framework by 
implicitly estimating the underlying correlation structure. In particular, they assumed 
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that the inverse of the working correlation matrix R(a) can be expressed as a linear 
combination of pre-specified basis matrices M 1; . . . , M s such that 



R- 1 (a) = ^a J M„ 



(4) 



where a±, . . .,a a are unknown constants. For this article, we choose Mi as the iden- 
tity (of appropriate dimension) and M2, • • • , M s according to the form of the assumed 
underlying correlation structure. For example, (1) if R(ck) is an exchangeable cor- 
relation matrix, then s = 2 and M 2 is a matrix of Is; and (2) if R(a) is an AR-1 
correlation matrix, then s — 3, M2 takes 1 on the two main off diagonals and zero 
elsewhere and M3 takes 1 at the elements (1,1) and (n,n) and zero elsewhere. In 
general, M 3 is a minor boundary correction and can be omitted. If the covariate is 
time- independent, then the boundary correction does not have an effect on the infer- 
ence since the corresponding components of the score vector (see equation (5 below) 
are linearly dependent on other terms. 

The quasi-score estimating equations in (3) can be expressed, under the representation 
(4), as 



Vhf AT 1 / 2 («i Mi + • ■ • + a s M s ) A~ 1/2 (Y, - h, 



8=1 



These estimating equations are linear combinations of elements of a set of extended 
score functions 



1 N 

i=l 



where a set of "subject-specific" basic score functions is defined as 



ft 03) 



Vhf A7 l/2 Mi A" 1/2 (Yi 



> for i 



vhf a; 1/2 m s a; l/2 (Y. t - h 



1/2 



N. 



(5) 



Instead of directly estimating the parameters a\, . . . , a s , the QIF introduces a sample 
covariance matrix in order to combine the score functions in an optimal manner. In 



general, the equation g N ((3) = has no solution, since its dimension is greater than 
the number of unknown parameters. Instead, the parameter vector (3 is estimated by 
minimizing the QIF defined as 

Q N (0) := N g£(/3) C N \(3) g N ((3), (6) 

where an estimator of the second moment matrix of gi(/3) is 

1 N 

Cn((3):=-J2^sI((3). (7) 

i=i 

If the extended score vector g N ((3) has mean zero, then A r_1 Cjv(/3) is an estimator of 
the covariance matrix of g N ((3). The function Qn((3) measures the size of the score 
vector relative to its covariance matrix and large values of Qn((3) can be considered 
as an evidence against a particular value of f3. In this sense, the QIF plays a role 
similar to the negative of the loglikelihood in parametric statistical inference. In par- 
ticular, one can construct a goodness-of-fit test statistic Qn((3) for testing the model 
assumption in (1). It follows from the results of Hansen (1982) that the asymptotic 
distribution of Qn((3) is x 2 with (dim(g) — dim(/3)} degrees of freedom under the 
model assumption. 

The second moment matrix estimator defined in (7) is an average and hence under 
certain regularity conditions it will converge to the true covariance of gi(/3) as N — > 
oo. This convergence result is fundamental to adapting the large-sample framework 
of the GMMs (Hansen, 1982) to the QIF setting. The role of the matrix C^ 2 {(3) is 
similar to that of the matrix defined on p. 1040 of Hansen (1982), which is also 
required to converge to a non-degenerate limit. 

Our covariance estimator Cjy(/3) differs from that of QLL, who define a covariance 
with a factor of N~ 2 and correspondingly omit the factor of N from the QIF 
in (6). This has led to a number of imprecise claims in QLL, centered around their 
statement on p. 829 that Cat converges to E(Cn). In fact, their Cat converges to 
zero. Furthermore, the asymptotic result for (3 N in Section 3.6 of QLL is incorrect 
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since Cat in equation (8) of their article is approaching zero as N — » oo and is not 
a consistent estimator of (/3), the true covariance matrix of gi(/3) evaluated at 
(3 Q . However, our definition of the QIF matches that presented by Park and Lindsay 
(1999). 

While the correlation model (4) motivates our choice of the score vector, the funda- 
mental property E{gi(f3 )} = holds whether or not the covariance assumption is 
correct. Similarly, Ctv(/3o) consistently estimates N cov{g Ar (/3 )}. Therefore, infer- 
ence based on the QIF is semiparametric, in the sense that procedures are asymp- 
totically valid whether or not the covariance model is correct. Bickel et al. (1998) 
and Kosorok (2006) present a detailed exposition of the mathematical aspects of 
semiparametric inference. 

2 Large-Sample Properties of the Extended Score 
Functions 

In order to establish the asymptotic results in any regression problem, one must first 
state assumptions regarding the behavior of the design matrices as the sample size N 
increases. Without formulating such assumptions, it is not possible to establish even 
the consistency of the QIF estimators. However, such assumptions are missing from 
the earlier QIF work. 

The main requirement for our asymptotic theory is to be able to apply the strong 
law of large numbers to show that g N ((3), Cn((3) and other averages converge to 
appropriate non-degenerate limits. A sufficient condition is the following "random 
design" assumption. 

Assumption Al. The pairs (Yj, "Xj ) are assumed to be an independent sample 
from a {n x (q + l)}-dimensional distribution F, where Xj = (Xa, • • • , X* n ) is the 



9 



(q x Tridimensional design matrix for the zth (z = 1, . . . , N) subject. 

Remark 1. The independence part of Assumption Al is between different sub- 
jects, or with respect to the index i. The elements of Xj need not be independent 
of each other; hence, this assumption incorporates both time-dependent and time- 
independent covariates. Note that there exists a dependence of Yj on Xj through the 
link and variance functions in (1) and (2), respectively. 

All throughout this article, (•) denotes the expectation operator with respect to 
the true regression parameter vector f3 . Our results are based on the implicit as- 
sumption that all expectations are finite and the convergence statements are uniform 
for (3 in bounded sets. Uniformity results for the strong law of large numbers are 
derived by Rubin (1956). 

Theorem 1. [Asymptotic normality ofg N ((3 )] Let (3 be the true parameter vector. 
Under Assumption Al, 

g N (/3) ^ E^{ gl (f3)} = if f3 = /3 

and 

VNb n (/3 ) + N r {0,Hp o (p o )}, (8) 
where r = qs and (/3 ) is the true covariance matrix of gi(/3) evaluated at (3 . 

It is easy to verify that E^ {gi(/3 )} = 0. The following identifiability assumption is 
required to develop the large-sample theory for the QIF. 

Assumption A2. The parameter (3 is estimable, in the sense that Ep {gi(/3)} = 
if and only if (3 ^ /3 . 

Another application of the strong law of large numbers establishes that Cjy(/3) con- 
verges to its expected value, a non-degenerate limit, which is required to invoke the 
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results of Hansen (1982). 

Theorem 2. [Consistency of Cjv(/3)] Under Assumptions Al and A2, 

C N (J3)^Ep o { gl (P)ig<J3)}:=i:p o (J3) as iV-+oo. (9) 

In this article we follow the prescription of Assumption 3.6 of Hansen (1982) which 
is restated for the current framework. 

Assumption A3. The matrix (/3) is strictly positive definite. 

Remark 2. The estimator of second moment matrix Cj\r(/3) may be singular. How- 
ever, any vector in the null space of Cjv(/3) must be orthogonal to each of the subject- 
specific score functions gj(/3) (i = 1, . . . , N) and consequently to g N ((3). As a result, 
one can replace Cj f 1 (f3) by any generalized inverse such as the Moore- Penrose gener- 
alized inverse. 

We first state several important assumptions required to establish the large-sample 
properties ofg N ((3). 

Assumption A4- The parameter space of (3 denoted by B is compact. 

Assumption A5. The expectation Ep {g~Ar(/3)} exists and is finite for all (3 £ B and 
is continuous in (3. 

The compactness assumption is necessary to invoke the uniformity results of Rubin 
(1956) and it is unavoidable since the QIF surface is often not convex. For non- 
compact parameter spaces, Lemma 1 is only applicable to a sequence of local minima. 

From Theorem 2.1 of Hansen (1982), the following result holds. 

Lemma 1. [Consistency of (3 N ] Under Assumptions A4-A5, the QIF estimator 

P N := arg min Q N {(3) 
(3ets 

exists and f3 N (3 as N — > oo. 
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Our next goal is to derive the asymptotic distribution of (3 N . Let 

D(/3) :=E /3o{^ gl(/3) } = ^o {Vgl(/3)} - 
Once again, from the strong law of large numbers, it follows that 

Vg N (f3) ^ E^{V gl ((3)} = D(/3). (10) 

The extended score vector g N (f3) is a random vector and hence Vg N ({3) is a random 
matrix. Therefore, the claims on p. 829 of QLL that Vg N (/3) is nonrandom and 
E{Vg N ((3)} = gjv(/9) cannot be true. 

Assumption A6. The subject-specific score functions gj(/3) {i = 1,...,N) have 
uniformly continuous second-order partial derivatives with respect to the elements of 
the vector (3. 

Owing to Theorems 3.1 and 3.2 of Hansen (1982), the following result holds. 

Theorem 3. [Asymptotic normality of (3 N ] Under Assumptions A1-A6, the asymp- 
totic distribution of (3 N is 

VN (p N - (3 ) ^ N q {0,J~\[3 )} (11) 

as N — > oo, where 

J(/3 ) = D T (/3 )S, 1 (/3 )D(/3 ). (12) 



The claim by QLL on p. 835, below equation (Al), that (f3 N — (3 ) converges in law to 
a normal distribution is incorrect. In fact, y/N({3 N — (3 ) has an asymptotic normal 
distribution while (f3 N — f3 ) converges in probability to zero. The matrix do is not 
defined and it is not possible to define this in a manner consistent with the remainder 
of their article. While the statement of Theorem 1 in QLL requires the non-centrality 
parameter (consequently the partitioned matrices etc., d and S) to be 0(1), 
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the equation (Al) on p. 835 requires J^,^, to be O p (N 1 ). We believe, based on 
comparison with other work on the QIF, that the authors probably intended to write 
d = E{Vg N ((3)} = O p (l) and £ = cov{g Ar (/3)} = C^iV" 1 ). However, this means 
that the statement of their Theorem 1 is incorrect. 



3 Fundamental Results for the Quadratic Infer- 
ence Functions 

In this section, we establish several fundamental results for the QIF which lay the 
foundation for deriving the asymptotic distribution of inference functions presented 
in the next section. 

One main focus is on the vector VQn{(3) of partial derivatives and matrix V 2 Qn((3) 
of second-order partial derivatives. Along the way, we derive the correct versions of 
several claims made by QLL. For example, on p. 830, QLL incorrectly claim that 
V 2 Qn{(3) converges in probability. In fact, the second derivative matrix has size 
O p (N). 

Assumption Al. The first and second-order partial derivatives of g N (/3) and Cjy(/3) 
have finite means. 

Theorem 4. Under Assumption A7, 

-^7= VQ N (f3 ) -±> N q {0, J(/3 )} (13) 

as iV — > oo. There exists a non-random matrix V((3), continuous in (3, such that 

lvU(/3)=V(/3) + o p (l), (14) 

where V(/3 ) = J(/3 ) and o p (l) error term is uniform on compact sets. 

Proof: Differentiating Qat(/3), with respect to the kth (k = 1, . . . , q) element [3k of (3 
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yields 



8 Q N ((3) = 2iVg^(/3)C^(/3)%^ + iVg^(/3)^|^g 7V (/3). 



(15) 

From Theorem 1, it follows that at (3 — (3 , V~N g N ((3 ) = O p (l) and dC~^(j3)/d(3 k 
has a finite limit by the strong law of large numbers. Consequently, the second term 
in (15) is O p (l). Therefore, it follows that 

l —VQ N ({3 Q ) = VNVg T N (Po)C^((3 )g N ((3 ) + o p (l) 



2VN 



NB T (f3 ) £^(/3 ) g N (f3 ) + o p (l). 



The last equation follows from the result (9) in Theorem 1 and the result (10). The 
asymptotic distribution of \^Ng N ((3 ) given in (8) yields the result (13). 

Similarly, 

1 92 Q N (P) = 2g£(/3) C N \(3) ^|# + g^(/3) d l C A { f ] 6*09) 



As N — > oo, by the strong law of large numbers, each of these terms is converging to 
a non-degenerate limit, which in turn lead to result (14) for an appropriate V(/3). At 
(3 = /3 , all the terms involving gjy(/3) converge to zero, leaving only the third term: 

1 d 2 Q N (f3 ) dg T N ((3 ) ~, dg N ({3 ) 

In matrix form, this can be restated as 

!^W 2 Q N (f3 )=3(f3 ) + o p (l) (16) 
which establishes the result (14) at j3 = (3 , implying that V(/3 ) = J(/3 ). ■ 
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From Theorem 4, we have the following corollary. 

Corollary 1. The first and second derivatives of Q^(f3) at /3 = (3 satisfy 
VQ N (f3 ) = O p (VN) and V 2 Q N ((3 ) = O p (N), respectively 

The analysis of the first derivative vector as well as the second derivative matrix of 
Qn{@>) are critical to the development of asymptotic theory for the QIF and numer- 
ical algorithms to find the QIF estimators. However, several inadequacies exist in 
the previous results provided by QLL. First, their derivation involves multiplication 
of three and four-way arrays which are not clearly defined, leading to an incorrect 
expression for the second derivative matrix, V 2 Qn{(3 ). The claims made by QLL 
about convergence of the first and second derivatives of Qn{(3) contradict the results 
shown in Corollary 1. These problems in QLL lead to their claim on p. 835 that 
[ijj — -i/> ) and (A — Ao) have a limiting normal distribution, however, with a missing 
a factor of \J~N . 

4 Testing for General Linear Hypotheses within 
the QIF Framework 

In this section we first establish the quadratic approximation of the QIF in a local 
neighborhood of (3 . Next, we derive an asymptotic distribution of the test based on 
the QIF for testing a general linear hypothesis and demonstrate that the Theorem 1 
of QLL becomes a special case of our result. 

4.1 Asymptotic Distribution of the Inference Functions 

The fundamental principle underlying the large-sample results presented in this article 
is a quadratic approximation of the QIF in a local neighborhood of the true regression 
parameter vector f3 . Under this approximation, the problem of minimizing the QIF 
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is asymptotically equivalent to a generalized least squares criterion. Consequently, 
standard asymptotic results from linear models can be applied to the QIF framework. 

Definition: A ball of radius o(v / iV) is defined as {£ : ||£|| < r N }, where {r N : N > 1} 
is a sequence of constants with r N = o(yN). 

For exposition, we define 

ZN := 2~7n WQn{(3o) - (17) 

Theorem 5. [Quadratic approximation of the QIF] For a fixed g-dimensional vector 
£, the following representation holds: 

Qn (P + N-$ = Q N ((3 ) + 2 (£, Z N ) + f J(/3 ) £ + 0p (l) (18) 

as iV — > oo, where (•, •) is the vector inner product and the o p (l) term is uniform for 
£ in a ball of radius = o(\* r N). 

Proof: The Taylor series expansion yields 

Q N (j3 + N~* e) = Q N (P ) + 2 (£, Z N ) + 7^f V 2 Q N (f N ) & 

where 0* N lies between (3 and ((3 Q + N~ 1 / 2 £) for each JV. From the uniformity result 
(14), it follows that 

^ 2 Qn (/3] v )=v(/3j v )+o p (l). 

As N — > oo, /9]y — > /3 . From the continuity result of Theorem 4, it follows that 
V(/3jv) — >• V(/3 ) = J(/3 ), yielding the desired result. ■ 

Corollary 2. The quadratic approximation in Theorem 5 can be expressed as 

Q N (/3 + AT* |) = {Z N + J(/3 ) £} r J- x (/3o) {Z^ + J(/3 ) £} 

+ Q A ,(/3 )-Z^ J" 1 ^) Z^ + o p (l). 
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The representation in the above corollary establishes that the QIF is asymptoti- 
cally equivalent to a generalized least squares criterion. This simplifies derivation of 
large-sample results, since known properties of the weighted least squares will hold 
asymptotically for the QIF. Second, it leads to an IRGLS algorithm for finding the 
QIF estimator (3 N . 

The minimizer £* N of the quadratic approximation in (18) is given by 

Since Z^ has a limiting distribution, it follows that £* N lies in the ball of radius 
with probability converging to 1. This result, combined with the uniformity of the 
error term in (18), yields the following corollaries. 

Corollary 3. Let £ N be the minimizer of Qn (f3 + iV" 1 / 2 ^), then (3 N = ((3 Q + 
N ~ l/2 In)- Equivalently, 

$ N = -J-\(3 )Z N + o p (l) 

and 

P N = (3 - N~? J- 1 ^) Z N + o^N- 1 / 2 ). (19) 
Corollary 4. The asymptotic distribution of f3 N is given by 

VN(p N -P } ^N q {0,J~\f3 )} asiV-^oo. 

Corollary 5. The following result holds: 

Qn0n)=Qn(P o )-Z n J-\f3 )Z N + o p (l) as TV - oo. (20) 

4.2 Asymptotic Distribution of a Generalized QIF Test Statis- 
tic 

In this section, we derive the asymptotic theory for testing a general linear hypotheses. 
Consequently, the one presented in QLL becomes a special case. 
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Following the notation in Christensen (2002), we consider the problem of testing the 
general linear hypothesis 

H :A T (3 = b versus H x : A T (3 ± b, (21) 

where, for some p < q, the (qxp) matrix A imposes p linearly independent constraints 
on the parameter vector f3 and constant vector b e 

The QIF-based test statistic for testing the general linear hypothesis problem (21) is 

T N :=Q N N )-Q N N ), (22) 

where the unrestricted and constrained minimizers of Qn(P) respectively, are 

P N = arg min Q N (P) 
peB 

and 

P N = arg min Q N (/3). 

Theorem 6. [Null asymptotic distribution of Tjv] For testing the hypothesis problem 
(21), the QIF-based test statistic has the asymptotic representation 

T N = {A T J- x (/3 ) Z N } T {A T J-\(3 ) A}" 1 { A T J^ 1 ^) Z N } + o p (l), (23) 

where J(/3 ) is defined in (12). The asymptotic distribution of T/v under the null 
hypothesis H is 

Tn —> Xp as N —> oo, 

where p is the number of linearly dependent constraints imposed by the matrix A. 

Proof: Suppose that the null hypothesis H is true, then A T /3 = b. Note that 
£ = y/N (P - P ) implies A T £ = VN(A T p - A T p ) = \?~N ( A T P - b). Therefore, 
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minimizing the QIF in (18) subject to A T (3 = b is equivalent to minimizing it subject 
to A T £ = 0. Following arguments similar to those in the previous section, we obtain 

&v = A, - N-i J" 1 ^) [i - A { \ T J-\(3 ) A}' 1 \ T J-'W] Z N + o^N- 1 ' 2 ) 



and 



Qn(Pn) - Qn(P c 



+ {A T J- 1 ^) Z N f {A T J-^o) A}- 1 {A T J- 1 ^) Z^} 
-^J-^Zy + ^l). (24) 



Combining the results (20) and (24), we obtain (23). Equation (13) implies that 
A t Ztv has an asymptotic N p {0, A T J _1 (/3 ) Aj distribution. The asymptotic x\ 
distribution of T/v as N — > oo follows immediately. ■ 



4.3 Testing Under Local Alternatives 

We consider the hypothesis testing problem (21), but assume that the alternative 
hypothesis is true. Specifically, consider a sequence of local alternative parameter 
vectors (3 N = (/3 + A r ~ 1 / 2 #), where A T /3 = b and i? is a fixed g-dimensional vector. 

In order to establish the large-sample properties of the test statistic T/v under this 
model, we can proceed essentially as before with the exception that \/N g N (f3 ) has 
non-zero mean. The multivariate central limit theorems become, respectively 

and 

Z N = -±= VQ N (f3 ) -±> N q {-J(f3 )O,J(f3 )} 

as N — > oo. The asymptotic representation (23) continues to hold under the local 
alternatives; therefore, we have the following asymptotic distribution for T/v. 

Theorem 7. [Asymptotic distribution of T/v under local alternatives] Under f3 N — 
(f3 + N~ 1 / 2, &), the asymptotic distribution of the test statistic T/v is non-central 
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chi-squared with a non-centrality parameter defined as 

S 2 := tf T A {A T J-\f3 ) A}" 1 A T tf. (25) 

Example: QLL partitioned the regression parameter vector as f3 T = (iJj t , A T ) and 
considered testing the hypothesis H : ij) = tp . This corresponds to A T = (I 0) and 
b = t/> in the hypothesis problem (21). 

The result of Theorem 7 is applicable if we partition the asymptotic covariance matrix 
of f3 N as 

J(/3o)= \ Mo 3 ^° X °) , 
\ J AoV„ J AoAo/ 

where (3 = (i/> , Ao) is the null value of (3 = (tf),X). From standard results for the 
inversion of a partitioned matrix, the non-centrality parameter can be expressed as 

5 2 = -d T A (J AA - J^ Ao J- A A^. 

This agrees with the result in QLL, subject to the concerns over the scaling of J^^p 
discussed earlier. 

5 The IRGLS Algorithm 

In this section, we derive the iteratively reweighted generalized least squares (IRGLS) 
algorithm for finding the QIF estimator of (3. The necessity for such an algorithm is 
illustrated first using a simulated experimental data. 

We assumed ten subjects {i = 1, . . . , 10) and four observations per subject (t = 
1, ...,4) under an AR-1 correlation structure with autocorrelation p = 0.5. We 
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constructed the extended score vector g N (f3) using Mi 



I and 



10 



10 10 



M 2 



(26) 



10 1 



\0 1 0J 



as the basis matrices. The fitted models are = (3q + (3\ (t — 2.5) and 



log 



(1 - A*«) 



/3 o + 1 (t-2.5) for i = l,...,10;t = 1 



respectively, for the Gaussian and Bernoulli responses. Let (3 = (/5 ,A) T - Figure 
1 displays the QIF surface Qn(/3) for simulated correlated data generated from the 
Gaussian and Bernoulli distributions, respectively. Notice the strikingly different 
behavior under these two models. For the correlated responses from the Gaussian 
distribution, the QIF surface is bounded above by N = 10 and converges to 10 as 
\\(3\\ — > oo in any direction. Even in such a scenario, the Newton-Raphson algorithm 
can diverge. In the Bernoulli case, the surface has multiple ridges, valleys as well 
as local minima as — > oo in some directions. It is clear that, carefully designed 
algorithms are necessary to reliably find the global minimum of Qn((3). 

The QIF surface plots amplify the necessity for the development of a stable algorithm 
for finding the QIF estimator of (3. The Newton-Raphson algorithm, recommended 
by QLL, requires accurate starting values to converge, especially in situations resem- 
bling that of Figure 1(b). Furthermore, in order to implement the Newton-Raphson 
algorithm, we need to find the matrix V 2 Qn{(3) which can be a computationally 
daunting task even for small N. 

IRGLS Algorithm: The equation (19) forms the basis of our algorithm. 
Step 1. Start with an initial value of the parameter vector f3^\ 
Step 2. Find the updated value for (3 via 




for j = 1, 2, . . ., 
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Figure 1: The surface plot of Q N (f3), where f3 = (/3 , Pi) T , under the AR-1 correlation 
structure for the correlated (a) Gaussian responses and (b) Bernoulli responses. 
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where 

J N (J3) := Vg T N (f3) C^(/3) Vg N ((3). (27) 

If the above iterative scheme converges to a limit /3°°, then the limit must be a 
stationary point satisfying VQat(/3°°) = 0. An S-Plus library implementing this 
IRGLS algorithm is developed by Loader and Pilla (2005b). 

The IRGLS algorithm proposed here inherits the standard advantages of the IRLS 
methods (Green, 1984) over the Newton-Raphson algorithm: (1) it avoids the com- 
plexity of computing \7 2 Qn{@); and (2) the algorithm is guaranteed to move in a 
descent direction of the QIF surface. This second property ensures that the algo- 
rithm cannot converge to a local maximum. With simple bounds on the step size, 
the IRGLS algorithm converges to the QIF estimator from almost any starting point 
(Loader and Pilla, 2005a). 

6 Analysis of Respiratory Health Effects Data 

We analyze part of the longitudinal binary data on respiratory health effects of indoor 
and outdoor air pollution in six U.S. cities measured on 537 children at ages 7 to 10. 
One of the interests of the study is to determine the effect of maternal smoking on 
the children's respiratory illness. Laird et al. (1984) considered the data collected on 
children from Ohio and treated the maternal smoking habit as fixed at the first visit. 
The response is binary indicating the presence or absence of respiratory illness. The 
maternal smoking habit, in the preceding year, is recorded as a binary covariate. The 
mean response is modeled as a function of Age, Smoking habit and the interaction. 
One of the goals of this study was to assess the effect of maternal smoking on chil- 
dren's respiratory illness. Note that measurements observed on each child are serially 
correlated. 
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We fit the following logistic model to the binary data 

log { (i^y } = Po + PlXa + hXa + fa XiiXi2 

for % = 1, . . . , 537 and t = 0, . . . , 4, where Xn and X i2 are the time-independent 
covariates for the age of the child and maternal smoking habit, respectively. The 
matrix Aj is diagonal with elements v(fii t ) = /^(l — [/,&)■ The extended score vector 
gjv(/3) is constructed by choosing s — 2, Mi = I and M2 as in (26). 

The standard errors of f3, denoted by s((3), are computed as the square root of 
diagonal elements of J^ r 1 (/3), where Jjv(0 is defined in (27). Table 1 presents the 
estimators (3 and their corresponding standard errors s(f3), obtained via the QIF 
library (Loader and Pilla, 2005a). The t-ratios suggest that Age is the significant 
covariate to include in the model. The t-ratio for age has a negative sign indicating 
that older children are less likely to have a respiratory illness, whereas mother's 
smoking habit has a positive effect on children's respiratory disease, although not 
statistically significant. The interaction between the age of the child and maternal 
smoking is also not statistically significant. 

Table 1: The parameter estimators and corresponding estimated standard errors for 
the Respiratory Health Study under the AR-1 Correlation Structure. 



Covariates (3 s((3) t-ratio 

Intercept -1.89404 0.11903 -15.91226 

Age -0.12933 0.05671 -2.28062 

Smoke 0.26384 0.18962 1.39144 

Age x Smoke 0.06070 0.08791 0.69048 



In order to assess whether the sub-models are adequate, we compute the QIF under 
various sub-models with certain parameter restrictions to perform chi-square tests for 
comparing different models. 
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Each row of the Table 2 represents results for a given model and the last row provides 
the full model. For each model, we compute the parameter estimate (3 and report 
the corresponding Qn{(3). The test statistic TJv is obtained via (22) which compares 
with the full model. The "df" column is the degrees-of-freedom for the test statistic 
and P is the P- value. From the table, the models "Intercept" and "Intercept, Smoke" 
are rejected (P < 0.05). The remaining models that include the Age variable cannot 
be rejected. 

Table 2: Testing of hypotheses for the longitudinal data on children's respiratory 
disease. The column Qn (j3^ is minimum of the QIF obtained under the submodel, 
Tn is the value of the test statistic, df is the degrees of freedom and P is the p-value 
for the test statistic. 



Covariates 


Qn (3) 


Tn 


df 


P 


Intercept 


11.898 


7.926 


3 


0.048 


Intercept, Smoke 


10.337 


6.365 


2 


0.041 


Intercept, Age 


5.823 


1.851 


2 


0.396 


Intercept, Smoke, Age 


4.449 


0.477 


1 


0.490 


Intercept, Smoke, Age, Age x Smoke 


3.972 










Remark 3. The discrepancy between our results and those of QLL is appar- 
ently due to their use of undocumented modifications of the covariance estimators 
of cov{g A r(/3)}. The results in Table 2 are based on minimization of Qn{(3) as de- 
fined in (6). 
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7 Assessing the Effect of Basis Matrices on the 
QIF Test 



In this section we investigate the effect of the choice of basis matrices, defining the 
extended score vector g iV (/3), on the performance of the QIF test. The basis matrices 
are often unknown in advance; hence, it is necessary to assess the effect of their 
misspecification. We calculate the size and power of the QIF test based on T/v through 
the simulated correlated data from Bernoulli and Gaussian distributions. 

The QIF test statistic is robust to the choice of basis matrices, in the sense that 
the null asymptotic chi-squared distribution of T/v is valid whether or not the basis 
matrices Mj (j = l,...,s) are correctly specified. However, misspecification may 
adversely affect the power of the test, since the asymptotic covariance matrix J~ 1 (/3 ) 
defined in (12) [consequently, the non-centrality parameter 5 2 in (25)] depends on the 
true covariance matrix of f3 . 

The following is the trade-off for misspecification of the basis matrices: (1) If too few 
basis matrices are included in the model, then the estimator of f3 may not be efficient; 
consequently leads to a loss of power of the test based on T/y. (2) If too many basis 
matrices are specified in the model, then the dimensionality of the extended score 
vector g N (f3) increases. This can lead to numerical instability while affecting the 
power of the QIF test based on T/v as more components of Cat(/3) are being estimated. 

We conducted two simulation experiments each with N = 50 subjects and n = 5 
observations per subject to investigate the effect of the basis matrices on the QIF- 
based inference. 

Let AR-1 refer to representing R~ 1 (a) = («i Mi + a 2 M 2 ) and AR-2 refer to ex- 
pressing R~ 1 (o:) = («i Mi + «2 M2 + «3 M3), where Mi = I (the identity matrix of 
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dimension 5), 



M, 



1 (A 
10 10 
10 10 
10 1 

o o o i oy 



and 



M, 



1 0^ 

1 

1 1 
10 

o o i o oy 



Binary Correlated Responses: For each subject, binary responses Y it for i = 1, . . . , 50 
and t = 1, . . . , 5 were generated according to the following two-state Markov chain 
with the transition matrix 



P 



1 \ I I- fit ^ 

p) 



1/ V 1 - Hi Hi j 

where p>i is defined in (28) below. The response vector Yu has the stationary distri- 
bution ^1 — fa and the AR-1 correlation structure with autocorrelation p. We 
fit the following logistic model to the binary data 

Pi 



log 



Pi 



Po + (3i Xi for % = 1, . . . , 50, 



(28) 



where the covariate Xi is chosen to be equally spaced on the interval [—1,1]. 

For each simulation experiment, the QIF test statistic T/v, defined in (22), for Hq : 
f3\ = versus H\ : fti ^ was compared with the critical value 3.8415, based on the 
95th percentile of the xi distribution. We chose the following true parameters for the 
simulation experiment: j3 = /3i = under H and @ = 0,@i = 0.5 under Hi. 

Gaussian Correlated Responses: The same design and parameter values were used 
for this simulation experiment; however, we fit the following model to the continuous 
data 

Y it = A) + PiXi + e it for i = 1, . . . , 50; t = 1, . . . , 5, 

where for each i, eu is assumed to be a Gaussian AR-1 process with variance 1 and 
correlation p. 
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Table 3: Achieved significance level and power under three different assumed correla- 
tion structures for the QIF test based on Tjy. Results are based on 10,000 replications 
under the true AR-1 correlation structure with autocorrelation p. 



Model 



p Level of Significance 



Power 



Identity AR-1 AR-2 Identity AR-1 AR-2 





0.2 


0.048 


0.048 


0.048 


0.473 


0.463 


0.447 


Logistic 


0.5 


0.050 


0.049 


0.049 


0.325 


0.327 


0.322 




0.8 


0.047 


0.050 


0.048 


0.226 


0.228 


0.227 




0.2 


0.050 


0.047 


0.048 


0.968 


0.954 


0.933 


Gaussian 


0.5 


0.044 


0.046 


0.044 


0.843 


0.822 


0.795 




0.8 


0.052 


0.050 


0.051 


0.644 


0.633 


0.601 



Table 3 presents the simulation results under the logistic and Gaussian models, re- 
spectively. Under each scenario, we achieve significance levels close to the nominal 
level of 5%, while the power decreases as the correlation p increases. However, for 
a fixed p, there is a minimal difference between the powers attained under the three 
different correlation structures. In particular, there is essentially no power loss when 
we assumed (albeit incorrectly) the identity correlation structure. 

To investigate further, we estimated the non-centrality parameter based on (25) under 
different scenarios by finding 



averaged over all 10,000 replications, where (3\ = 0.5 is the slope parameter under 
H u A T = (0 1) and J^(-) is defined in (27). 

Table 4 presents the 5 2 values along with the power of the QIF test calculated under 
the theoretical asymptotic non-central chi-squared distribution of T^. These results 




2n 



demonstrate that the model with a misspecified identity correlation structure yields a 
slightly smaller 5 2 and correspondingly slightly lower power. However, this difference 
is not reflected by the finite-sample simulation results presented in Table 3. 

Table 4: Estimated non-centrality parameter 5 2 values and powers for the QIF test 
based on T/v under three different assumed correlation structures. Results are based 
on 10,000 replications under the true AR-1 correlation structure with autocorrelation 

P- 



Model 


P 










Power 








Identity 


AR-1 


AR-2 


Identity 


AR-1 


AR-2 




0.2 


4.122 


4.286 


4.437 


0.528 


0.544 


0.558 


Logistic 


0.5 


2.452 


2.609 


2.678 


0.347 


0.365 


0.373 




0.8 


1.431 


1.515 


1.517 


0.223 


0.234 


0.234 




0.2 


17.982 


19.263 


20.478 


0.989 


0.992 


0.995 


Gaussian 


0.5 


11.067 


12.189 


12.961 


0.914 


0.937 


0.950 




0.8 


6.837 


7.582 


8.071 


0.744 


0.786 


0.811 



The conclusion from the simulation experiments is that not much is lost when only 
the identity correlation structure is assumed. This does not mean that correlation 
structure is not important, but rather that correlation is being adequately modeled 
through the empirical covariance matrix Cn(/3) even under the misspecification of 
the basis matrices. 
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8 Conclusions 



The QIF is a powerful tool for building regression models for correlated data. The 
large-sample properties of the inference functions are similar to those of the loglikeli- 
hood in parametric statistical inference with test statistics based on the QIF having 
asymptotic chi-squared distributions. As shown in Section 1.3, the covariance ma- 
trix for the extended score functions defining the QIF employed by QLL can lead to 
breakdown of the asymptotic theory. 

In this research, we established a unified large-sample theoretical framework for the 
QIF. First, we formulated an accurate estimator for the covariance of the extended 
score functions gjv(/3) and second, we derived relevant asymptotic results necessary 
for the estimation and testing within the QIF setting. The key principle underlying 
our asymptotic treatment is the quadratic approximation in Theorem 5. The conse- 
quences of this approximation are wide-ranging, providing the necessary machinery 
for deriving large-sample theory for the QIF estimators and test statistics, analogous 
to standard inferential theory for the generalized least squares criteria, while leading 
to a stable and flexible algorithm for finding the QIF estimators. Our simulation 
experiments demonstrate that the QIF test statistic Tjy is robust to the choice of 
basis matrices, in the sense that the null asymptotic chi-squared distribution of T N 
holds true even under the misspecification of the basis matrices. 
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